## 10G BASE-T System Overview

Member: Chien, Ying-Ren

Lee, Jan-Hwa

Advisor: Tsao, Hen-Wai

Date : August 24, 2005

### Outline

- Introduction
  - 10G connectivity options
  - System overview
  - System requirement
- Channel Models
- Coding & Modulation
- Equalization
- Timing recovery
- Transmitter front end solutions
- Start-up protocol
- References

## 10G Connectivity Options

| Standard    | Technology                       | Media                                    | Reach   |
|-------------|----------------------------------|------------------------------------------|---------|
| 10GBASE-SR  | 850nm                            | 62.5μm MMF                               | 2-33m   |
|             | 10.3G Serial                     | 50μm MMF                                 | 2-300m  |
| 10GBASE-LX4 | 1300nm CWDM                      | 50/62.5μm MMF                            | 2-300m  |
|             | 4 X 3.125Gbps                    | 10μm SMF                                 | 2-10km  |
| 10GBASE-LRM | 850 & 1300nm<br>10.3G Serial EDC | 50/62.5μm MMF                            | 0.5-300m |
| 10GBASE-CX4 | 4 X 3.125Gbps                    | 8 Pair Shielded Cu<br>"InfiniBand Cable" | ≤15m    |
| 10GBASE-T   | 128DSQ w/FEC @                   | Cat 6                                    | 55-100m |
|             | 800 MHz                          | Cat 7                                    | 100m    |

### 10GBASE-T: When?

- 2002, Nov : Project Initiation
- 2003, Nov : Project Authorization
- 2004, July : Draft 1
- 2005, Mar: Working Group Ballot
- 2005, July : Draft 2
- 2006, Jan : Draft 3
- 2006, July: RevCom Standard Released

## IEEE 802.3an : 13 Objectives (1/2)

- Preserve Ethernet frame format
- Preserve Ethernet frame size
- Support full duplex operation only ( No CSMA / CD )
- Will not support 802.3ah (EFM)
  - unidirectional operation EFM is half duplex
- Support a speed of 10Gbps
- Select copper media from ISO/IEC 11801:2002
- Support auto-negotiation
- Support coexistence with 802.3af
- Meet CISPR/FCC Class A

## IEEE 802.3an : 13 Objectives (2/2)

- Support star-wired LANs using structured cabling
- Support operation over 4-connector structured 4-pair, twisted-pair copper cabling
- Define a single 10 Gb/s PHY that supports links of:
  - 100 m, four-pair Class F (CAT 7)
  - 55 m to 100 m, four-pair Class E (CAT 6)
  - 55 m CAT 6 unshielded & 100 m CAT 6 shielded
- Support a BER of 10^-12 on all supported distances and classes

# System Overview

#### 10G Ethernet vs. OSI Ref Model



## 10GBASE-T System Block Diagram



## Major TX Path Blocks

#### PCS

- XGMII to 64B/65B scrambler to FEC (LDPC) which is mapped to 128DSQ which is then mapped to 4 channels
- Modulation
  - 16 Level Pulse Amplitude Modulation (16 PAM)
- Tomlinson-Harashima Precoding (THP)
  - Transmit equalization
  - Pre-compensates the signal based on knowledge of the channel
  - Suited to (quasi-)static channels

## Introduction of TH Precoding(1/3)

How to eliminate ISI?



$Z(D)$ is an integer sequence chosen so that $-M < X(D) < M$.

In the Z-transform (D) domain:

$$X(D) = d(D) + 2MZ(D) - X(D)[H(D) - 1]$$

Simplifying:

$$X(D) = \frac{[d(D) + 2MZ(D)]}{H(D)} = \frac{Y(D)}{H(D)}$$

## Introduction of TH Precoding(2/3)

Receiver signal is

$$r(D) = X(D)H(D) + n(D)$$

$$= Y(D) + n(D)$$

$$= d(D) + 2MZ(D) + n(D)$$

so the ISI is eliminated.

After the modulo-2M operation:

$$\widehat{d}(D) = d(D) + n(D)$$
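The precoding loop and the receiver's modulo step can be checked numerically. The following is a minimal sketch, not taken from the slides: the channel taps `h`, the symbol count, and the helper names (`wrap`, `thp_precode`, `channel`) are illustrative assumptions, the channel is assumed monic and causal, the modulo folds into the half-open interval [-M, M), and noise is omitted so that recovery is exact.

```python
import random


def wrap(v, M):
    """Modulo-2M reduction into the interval [-M, M)."""
    return (v + M) % (2 * M) - M


def thp_precode(d, h, M):
    """Precode symbols d for a monic causal channel h = [1, h1, h2, ...]."""
    x = []
    for n, dn in enumerate(d):
        # ISI the channel will add from earlier transmitted samples: X(D)[H(D) - 1]
        isi = sum(h[k] * x[n - k] for k in range(1, len(h)) if n - k >= 0)
        # The wrap implicitly adds the 2M*Z(D) term that keeps x bounded
        x.append(wrap(dn - isi, M))
    return x


def channel(x, h):
    """Noise-free channel: r = x convolved with h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


M = 16                      # 16-PAM: symbols in {-15, -13, ..., +15}
h = [1.0, 0.5, -0.25]       # illustrative channel taps (assumption)
random.seed(0)
d = [random.choice(range(-15, 16, 2)) for _ in range(1000)]

x = thp_precode(d, h, M)    # transmitted (precoded) samples
r = channel(x, h)           # equals d + 2M*Z at the receiver
d_hat = [wrap(v, M) for v in r]
```

With these assumptions `d_hat` reproduces `d`, while every transmitted sample stays within the modulo interval regardless of the channel taps.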

## Introduction of TH Precoding(3/3)



## Major RX Path Blocks

- Hybrid function
  - Enable bidirectional data transmission on a single pair
  - Stop the local transmitted signals from being mixed with the local received signals.
- A/D
  - 800 MHz, ~10-bit accuracy
- Echo/NEXT Canceller
- Matrix FFE
- LDPC Decoder
- 65B/64B Descrambler

## Receiver Path for Channel 0 (1/3)



## Receiver Path for Channel 0 (2/3)



## Receiver Path for Channel 0 (3/3)



# System Requirements

## Tentative 10GBASE-T Spec

| Parameter                | Value                                                                |
|--------------------------|----------------------------------------------------------------------|
| Modulation Code          | Baseband 128-DSQ (PAM-16)                                            |
| Baud Rate                | ~ 800 MHz                                                            |
| FEC                      | LDPC(2048,1723) systematic                                           |
| Framing                  | 64B/65B                                                              |
| Transmitter Equalization | Tomlinson-Harashima Precoding                                        |
| Required SNR             | 23.4dB                                                               |
| Transmit Filter          | Spectral mask                                                        |
| Transmit Power           | 3.2 dBm to 5.2 dBm (centered at 4.2 dBm) at the MDI (medium dependent interface) |
| Transmit Voltage         | 2V +/- 5% at MDI                                                     |
| Receiver Filter          | Low order CT                                                         |
| Receiver Equalization    | Adaptive FFE                                                         |
| MAC Interface            | XGMII                                                                |

## Tentative System Requirements

Assuming 15 dB of analog echo suppression:

| Parameter          | Requirement                 |
|--------------------|-----------------------------|
| Echo Suppression   | ~55 dB                      |
| NEXT Suppression   | ~40 dB                      |
| FEXT Suppression   | ~25 dB                      |
| ADC/DAC resolution | 10 bit                      |
| Precoding function | ARMA(3,3): 3 zeros, 3 poles |
| FFE span           | 64 taps                     |

## Channel Models

## Channel Impairments

- Insertion Loss (I.L.)
- Near End Crosstalk ( NEXT )
- Far End Crosstalk (FEXT)
- Echo
- Alien Crosstalk

### Channel Model Data



### Insertion Loss



### NEXT





## FEXT/ELFEXT





### Echo

#### Near-end Echo / Far-end Echo





### Alien Crosstalk

Characterized as a "power sum"
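A power sum adds the coupled power of every disturber in linear units and converts the total back to dB. The sketch below assumes that usual cabling definition; the function name `power_sum_db` and the example loss values are illustrative and do not come from the slides.

```python
import math


def power_sum_db(coupling_losses_db):
    """Combine per-disturber coupling losses (positive dB) into one power-sum loss."""
    total_linear = sum(10 ** (-loss / 10.0) for loss in coupling_losses_db)
    return -10.0 * math.log10(total_linear)


# Hypothetical example: six neighbouring disturbers, each 60 dB below the signal
example = power_sum_db([60.0] * 6)
```

Each doubling of equal-strength disturbers lowers the power-sum loss by about 3 dB, which is why alien crosstalk from a bundle is quoted as a single power-sum figure.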





## Impairment Frequency Response



- Cable manufacturers guarantee cable performance using frequency domain "limit lines"
- No phase information
- Measurements needed to model in simulations



## Coding & Modulation

- DSQ vs PAM-12 modulation
- Coded modulation with LDPC
- 128-DSQ bit-mapping
- 128-DSQ soft De-mapping
- Coding and framing for 128-DSQ
- Performance of 128DSQ with LDPC(2048,1723)

### DSQ vs 12-PAM Modulation

- Using coded modulation
- 2D-constellations (DSQ-128 vs 12-PAM^2)



# BER for intra-subset (uncoded bits) => choosing the minimum distance in a subset



128-DSQ: $M = 16$; $E_x = (2M)^2/12$; $\Delta_0^2 = 8$

$$\Delta_3^2 = 8\Delta_0^2 : \quad \Pr(e)_{ISS-kv3} = \frac{1}{2} \times 4 \times Q\left(\frac{\Delta_3}{2\sigma_w}\right)$$

$$\Delta_4^2 = 16\Delta_0^2 : \quad \Pr(e)_{ISS-kv4} = \frac{1}{2} \times 4 \times Q\left(\frac{\Delta_4}{2\sigma_w}\right)$$
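The two intra-subset error expressions can be evaluated for a given SNR. This is a rough sketch under one assumption that the slide does not state: the SNR is taken as $E_x/\sigma_w^2$ per dimension. The function names and the 23.4 dB operating point (the required SNR quoted elsewhere in this deck) are used only for illustration.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def intra_subset_error(delta_sq, snr_db, M=16):
    """Pr(e) = 1/2 * 4 * Q(delta / (2 * sigma_w)) for a squared distance delta_sq."""
    e_x = (2 * M) ** 2 / 12.0                          # E_x = (2M)^2 / 12
    sigma_w = math.sqrt(e_x / 10 ** (snr_db / 10.0))   # assumes SNR = E_x / sigma_w^2
    return 0.5 * 4 * q_func(math.sqrt(delta_sq) / (2.0 * sigma_w))


DELTA0_SQ = 8.0
p_level3 = intra_subset_error(8 * DELTA0_SQ, 23.4)    # Delta_3^2 = 8 * Delta_0^2
p_level4 = intra_subset_error(16 * DELTA0_SQ, 23.4)   # Delta_4^2 = 16 * Delta_0^2
```

Going from the level-3 to the level-4 partition doubles the squared distance, so `p_level4` comes out many orders of magnitude below `p_level3`.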

### Coded-modulation with LDPC

- 12 dB set partitioning (the distance within a subset is $4\Delta_0$)
- 16 subsets (4 LDPC-encoded bits select the subset)
- Every subset has 8 symbols (3 uncoded bits)



## 128-DSQ bit-mapping (1/2)



## 128-DSQ bit-mapping (2/2)

7-bit label → two 16-PAM symbols

$$\begin{aligned}
x_1^3 &= \overline{u}_1 \,\&\, u_3 \\
x_1^2 &= u_1 \oplus u_3 \\
x_1^1 &= c_1 \\
x_1^0 &= c_1 \oplus c_2 \\
x_2^3 &= (u_2 \,\&\, u_3) \lor (u_1 \,\&\, \overline{u}_2) \\
x_2^2 &= u_2 \oplus u_3 \\
x_2^1 &= c_3
\end{aligned}$$

Diagram labels: a modulo-16 adder stage, with output $y_1$ mapped to $a_1 \in \{\pm 1, \pm 3, \dots, \pm 15\}$.

# 128-DSQ Soft De-mapping for the 4 coded bits

Two received 16-PAM symbols (one DSQ symbol) are used to generate the four initial values for the 4 coded bits used in LDPC decoding.

$$\mathrm{llrb}(x) = \ln \frac{\sum_{k} \left[ \exp\left(-[x-(4k+0)]^2/2\sigma^2\right) + \exp\left(-[x-(4k+1)]^2/2\sigma^2\right) \right]}{\sum_{k} \left[ \exp\left(-[x-(4k+2)]^2/2\sigma^2\right) + \exp\left(-[x-(4k+3)]^2/2\sigma^2\right) \right]} \cong \frac{1}{\sigma^2} \begin{cases} x+0.5 : 0 \le x \le 0.5 \\ 1.5-x : 0.5 \le x \le 2.5 \\ x-3.5 : 2.5 \le x \le 4 \end{cases}$$
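The piecewise-linear form follows from keeping only the nearest point in the numerator and in the denominator. The sketch below compares it with the full sum; the function names, the truncation of the sum to `kmax` periods, and the noise level are illustrative assumptions.

```python
import math


def llr_exact(x, sigma, kmax=8):
    """Log-likelihood ratio from the full sums, truncated to |k| <= kmax."""
    num = den = 0.0
    for k in range(-kmax, kmax + 1):
        for offset in (0, 1):   # points labelled with the bit equal to 0
            num += math.exp(-(x - (4 * k + offset)) ** 2 / (2 * sigma ** 2))
        for offset in (2, 3):   # points labelled with the bit equal to 1
            den += math.exp(-(x - (4 * k + offset)) ** 2 / (2 * sigma ** 2))
    return math.log(num / den)


def llr_approx(x, sigma):
    """Piecewise-linear approximation, periodic in x with period 4."""
    x = x % 4.0
    if x <= 0.5:
        value = x + 0.5
    elif x <= 2.5:
        value = 1.5 - x
    else:
        value = x - 3.5
    return value / sigma ** 2


sigma = 0.3                       # illustrative noise standard deviation
exact = llr_exact(1.0, sigma)
approx = llr_approx(1.0, sigma)
```

At moderate noise levels the two agree closely away from the breakpoints, which is what makes a small lookup table or a few comparators sufficient in hardware.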

### Coding and framing for 128-DSQ

#### 128-DSQ modulation with LDPC (2048, 1723) code



Code Block: 1723 + 1536 = 3259 info bits ( 3.1826 bit/dim )
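The block arithmetic can be reproduced from figures quoted in this deck: 4 coded and 3 uncoded bits per DSQ symbol, two PAM dimensions per DSQ symbol, four pairs, and a baud rate of about 800 MHz. The sketch below is only a consistency check; the variable names are illustrative, and the exact 800 MHz value is an assumption since the deck gives it as approximate.

```python
LDPC_N, LDPC_K = 2048, 1723      # LDPC(2048,1723)
CODED_BITS_PER_DSQ = 4
UNCODED_BITS_PER_DSQ = 3
PAIRS = 4
BAUD_HZ = 800e6                  # assumed exact for this estimate

dsq_symbols = LDPC_N // CODED_BITS_PER_DSQ          # DSQ symbols per code block
uncoded_bits = dsq_symbols * UNCODED_BITS_PER_DSQ   # bits that bypass the LDPC encoder
info_bits = LDPC_K + uncoded_bits                   # information bits per block
dimensions = 2 * dsq_symbols                        # 16-PAM symbols per block
bits_per_dim = info_bits / dimensions

block_time_s = dimensions / (PAIRS * BAUD_HZ)       # time to send one block on 4 pairs
info_rate_bps = info_bits / block_time_s
```

Under these assumptions the information rate comes out slightly above 10 Gb/s, which leaves room for the 64B/65B framing overhead.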



### Framing details: 128DSQ + LDPC(2048,1723)



### 128DSQ + LDPC(2048,1723) performance

System required SNR: 23.4 dB



- BER = 10<sup>-12</sup> at SNR = 23.32 dB
- 5-bit look up table
- Final point at SNR = 23.35dB: 2.184e13 bits simulated with 1 block error
- 128DSQ mapping as described in ungerboeck\_1\_1104.pdf and ungerboeck\_2\_0904.pdf
- G and H matrices as defined in 802.3an public area

### Equalization

- Channel equalization via DFE ?
- Decoupling FEC from equalization (THP)
- Tx-based equalization
- Fundamental benefits of precoding
- Precoder adaptation ?
- Precoder architecture (IIR or FIR)?

### Channel equalization via DFE?

### DFE cannot be separated from channel coding

- Error propagation substantially reduces coding gain
- Zero delay decisions irreconcilable with basic idea of channel coding



- Usually requires placing some portion of the decoder inside the feedback loop (DFSE)
  - **DFSE**: combines a Viterbi decoder with the slicer to reduce decision errors (used in 1000BASE-T)
  - Introduces a critical timing path, which limits the maximum baud rate
  - Incompatible with high-performance block or iterative codes

### Decoupling FEC from equalization

- Precoding is a well-known technique for decoupling channel equalization from channel coding
  - Move DFE from RX to TX
  - Necessary for LDPC or concatenated coding schemes



### Tx-Based Equalization

- Moves postcursor equalization from the receiver to the transmitter
- Achieves performance similar to a DFE with correct decisions
  - Precoding feedback symbols are known, not estimated
  - Equalization is independent of channel coding performance



### Fundamental benefits of Precoding

- Permits more powerful channel codes required to meet 10Gbps
  - Decouples equalization from channel coding
- Retains asymptotic optimality of decision feedback equalization without error propagation
- Does not affect transmitted spectrum (EMI)
  - Does not preclude any form of transmit filtering
- Removing the DFSE timing loop simplifies timing closure

### Precoder adaptation not necessary?

### Programmable precoding

- Precoder coefficients chosen at start-up to approximately match channel response
- Adaptive linear RX removes residual ISI



### Coefficients are a function of cable length

Pre-store them in a small table indexed by cable length

# Number of Precoder coefficients reduced over 10x with IIR model

The overall channel is accurately modeled by a 2nd-order IIR filter




# Timing recovery

- Loop Timing
- Multiphase VCO approach
- Digital interpolation approach

# Loop Timing

 Echo & NEXT cancellation generally require the transmitter and receiver to be clocked from the same source



# Traditional Analog/Digital Phase Compensation Concept



- optimal baud spaced samples are at pulse peak
- independently adjust phase for each channel

### 4-channel sampling approaches

- Approach 1: independent PLL per channel
- Approach 2: single VCO with phase selector
- Approach 3: digital interpolator instead of analog VCO or phase selector (free-running clock for ADC)

### Independent PLL per channel



#### Theoretically possible but has implementation challenges

- Difficulties with multiple VCOs on the same die: interactions are almost inevitable
- Injection locking has been shown to be the root-cause failure mechanism in previous products

# Single VCO with phase selector (1/2)



- Phase steps cause echo cancellation errors
- 50-60 dB echo cancellation requires a very large number of phases
  - → Number of phases for 10G >> number of phases for 1G

# Single VCO with phase selector (2/2)

### Implementation challenge

High precision phase selector



- Latency (ADC+FFE) in PLL loop
- The low jitter requirement may constrain design options for the ADC, FFE, and slicer
- Trade off between jitter tolerance and latency
  - → Jitter tolerance requires PLL bandwidth to be increased
  - → Latency requires PLL bandwidth to be reduced

# Free running clock / Digital interpolation (1/3)



- ADC and FFE run from the same free-running clock
- Timing recovery moves to after the FFE

# Free running clock / Digital interpolation (2/3) [Advantage of this way]

- Removes the latency restriction on the ADC and FFE
  - Opens opportunities for design innovation
- A free-running clock removes the requirement for a high-precision phase selector
- ADC, DAC, FFE, and echo canceller all run on the same clock, which removes the need for FIFOs
- Analog PLL per channel not required; a single clock serves all 4 channels

# Free running clock / Digital interpolation (3/3) [Implementation challenge]

- Baud-rate sampling with zero excess bandwidth
  - When the ADC samples with a free-running clock, Nyquist-rate sampling or oversampling is needed to recover the signal
    - → Limits transmitted BW <= (1/2T) when sampling at baud rate
- Slave optionally transmits with recovered clock
  - Permits a single free-running clock at both ends
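The reason a free-running ADC clock is workable is that a signal confined to |f| < 1/2T can be re-evaluated at any sampling phase from its T-spaced samples. The sketch below illustrates this with a Hann-windowed sinc interpolator; that interpolator, the 64-tap span, and the test tone at 0.2/T are all illustrative assumptions, not the structure of any particular 10GBASE-T receiver.

```python
import math


def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u)."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def interpolate(samples, t, half_span=32):
    """Estimate the signal at fractional time t (in units of T) from T-spaced samples."""
    n0 = math.floor(t)
    total = 0.0
    for n in range(n0 - half_span + 1, n0 + half_span + 1):
        u = t - n
        window = 0.5 * (1.0 + math.cos(math.pi * u / half_span))  # Hann taper
        total += samples[n] * sinc(u) * window
    return total


# Band-limited test signal: a tone at 0.2 cycles per baud period, sampled at baud rate
FREQ = 0.2
samples = [math.cos(2 * math.pi * FREQ * n) for n in range(256)]

t = 100.37                                    # arbitrary sampling phase
estimate = interpolate(samples, t)
reference = math.cos(2 * math.pi * FREQ * t)
```

The estimate tracks the reference closely for any choice of `t` well inside the sample buffer. A tone close to 1/2T would need a longer span, which is one way to see why the spectral null at 1/2T is desirable.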



### Transmitter front-end solutions

- Linearity specification
- SNR margin vs TX voltage
- TX bandwidth -- Zero excess bandwidth is desired
- TX PSD proposal
- Transmitter front-end
  - → Digital oversampling (2X) filter vs. analog transmit filter

### Linearity specification

- Two measurement indices
  - SFDR (dB) (signal-to-distortion ratio with a single-tone test)
  - IMD (dB) (signal-to-intermodulation-distortion ratio with a two-tone test)
- The local transmitter's nonlinearity can limit the capacity of the local receiver in the absence of nonlinear echo cancellation
- Two specs
  - "Recommended" spec (compliance not required): essentially a spec on the nonlinearity of the local transmitter
  - "Normative" spec (compliance required): ensures that the nonlinearity of a far-end transmitter does not cause the receiver to lose "much SNR margin"

## SFDR & IMD linearity measurement

The SFDR of the transmitter, when subject to single-tone inputs producing output with peak-to-peak transmit amplitude, shall be:

better than $X_{nonlin}$ dB in the frequency range $f \in (0.1, f_1]$ MHz, where $f_1$ is in MHz

and better than $[X_{nonlin} - X_{nlslope} \cdot \log_{10}(f/f_1)]$ dB for $f \in (f_1, 800/6]$ MHz.

The signal-to-intermodulation-distortion ratio of the transmitter, for dual-tone inputs producing output with peak-to-peak transmit amplitude, shall be better than:

$[X_{nonlin} + 2.5 - X_{nlslope} \cdot \log_{10}(f/f_1)]$ dB for $f \in (800/6, 800/2]$ MHz



### Two linearity spec

NORMATIVE SPEC PROPOSAL (compliance required): X<sub>nonlin</sub> = 50 dB, X<sub>nlslope</sub> = 20 dB, f<sub>1</sub> = 50 MHz, for the general equations given in clause 55.4 and repeated in slide 9
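The piecewise linearity limit is easy to tabulate. The sketch below evaluates it with the normative-proposal parameters as defaults; the function name and the choice to raise an error outside the specified bands are illustrative assumptions.

```python
import math


def linearity_limit_db(f_mhz, x_nonlin=50.0, x_nlslope=20.0, f1=50.0):
    """Minimum ratio in dB at frequency f_mhz.

    Applies to SFDR (single tone) up to 800/6 MHz and to IMD (two tone)
    from 800/6 MHz to 800/2 MHz.
    """
    if 0.1 < f_mhz <= f1:
        return x_nonlin
    if f1 < f_mhz <= 800 / 6:
        return x_nonlin - x_nlslope * math.log10(f_mhz / f1)
    if 800 / 6 < f_mhz <= 800 / 2:
        return x_nonlin + 2.5 - x_nlslope * math.log10(f_mhz / f1)
    raise ValueError("frequency outside the specified range")


sfdr_at_100 = linearity_limit_db(100.0)   # sloped SFDR region
imd_at_200 = linearity_limit_db(200.0)    # IMD region
```

Passing `x_nonlin=65.0, x_nlslope=0.0` gives the flat limit of the "recommended" proposal, for which the value of `f1` no longer affects the result.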



- For the "recommended" spec: Similarly, the SNR margin loss due to local transmitter linearity
  - 68dB causes 0.5dB SNR margin loss
  - 65dB causes ~1dB SNR margin loss.
- RECOMMENDED SPEC PROPOSAL (compliance not required, only recommended): keep the "recommended" spec at X<sub>nonlin</sub> = 65 dB, as this is much harder to meet, especially since here X<sub>nlslope</sub> = 0 and f<sub>1</sub> is a don't-care (= 400 MHz)

### TX voltage vs. Receiver SNR margin



### TX Bandwidth

- Because of insertion loss, channel gain approaches zero when f > (1/2T), so excess bandwidth wastes signal energy.
- With zero excess bandwidth, we can use digital interpolation to recover timing with free running clock for ADC.
  - → Phase insensitive sampling at baud rate
- Zero excess bandwidth with spectrum null at DC and (1/2T) is desired

# TX PSD(1/3) [Some assumptions for PSD mask]

- PSD mask assumptions
  - Transformer 1st pole at ~100kHz
  - Transformer pole f1 with substantial tolerance of 750MHz +/-33%
  - Transmitter pole f2, "simple filter pole" contributed by the total capacitance at transmitter and 50ohms. This is modeled as 750MHz +/-33% tolerance
  - Transmitter and board "parasitic" pole f3 with substantial tolerance for different implementations, 1200MHz +/- 33%.
  - Sinc roll-off, contributing majority of the band limitation.
- Assume that the voltage on the line side of the transformer, after the transformer insertion loss (in addition to its bandwidth loss), is 2V +/-6%.
- 2V+/-6% peak to peak differential at the MDI
  - meets the power spec
  - 2V+/-6% spec is better for transmit and echo cancellation linearity.
     Linearity limits SNR margin.

# TX PSD (2/3) [ PSD in draft 2.1 ]



# TX PSD (3/3) [Teranetics' Proposal]



#### Recommendation:

Reduce the upper PSD by 1dB in 0-70MHz.

0.5dB reduction on upper and lower curves everywhere else w.r.t. draft 2.1, would make it better centered.

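#### PSD upper curve:

| Frequency range (MHz) | Upper limit (dBm/Hz) |
|-----------------------|----------------------|
| 0 < f <= 70           | -79                  |
| 70 < f <= 150         | -79.5                |
| 150 < f <= 730        | -79.5 - (f-150)/58   |
| 730 < f <= 1810       | -79.5 - (f-330)/40   |
| 1810 < f < 3000       | -116                 |

#### PSD lower curve:

| Frequency range (MHz) | Lower limit (dBm/Hz) |
|-----------------------|----------------------|
| 5 < f <= 50           | -83.5                |
| 50 < f <= 200         | -83.5 - (f-50)/50    |
| 200 < f <= 400        | -86.5 - (f-200)/25   |

Where f and the ranges are in MHz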
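The proposed mask can be coded directly as two piecewise functions, which makes it straightforward to check a measured PSD point against it. This is a minimal sketch: the function names and the pass/fail helper are illustrative, and the breakpoints are copied as given in the proposal.

```python
def psd_upper(f_mhz):
    """Upper PSD limit in dBm/Hz for a frequency in MHz."""
    if 0 < f_mhz <= 70:
        return -79.0
    if 70 < f_mhz <= 150:
        return -79.5
    if 150 < f_mhz <= 730:
        return -79.5 - (f_mhz - 150) / 58
    if 730 < f_mhz <= 1810:
        return -79.5 - (f_mhz - 330) / 40
    if 1810 < f_mhz < 3000:
        return -116.0
    raise ValueError("frequency outside the upper-mask range")


def psd_lower(f_mhz):
    """Lower PSD limit in dBm/Hz for a frequency in MHz."""
    if 5 < f_mhz <= 50:
        return -83.5
    if 50 < f_mhz <= 200:
        return -83.5 - (f_mhz - 50) / 50
    if 200 < f_mhz <= 400:
        return -86.5 - (f_mhz - 200) / 25
    raise ValueError("frequency outside the lower-mask range")


def within_mask(f_mhz, psd_dbm_hz):
    """True if a PSD point lies between the lower and upper curves (5 < f <= 400 MHz)."""
    return psd_lower(f_mhz) <= psd_dbm_hz <= psd_upper(f_mhz)
```

For example, `within_mask(100.0, -82.0)` checks a hypothetical measurement of -82 dBm/Hz at 100 MHz against both curves.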

# Transmitter front end – only analog TX filter (baseline approach)

 No digital filtering, T-spaced DAC, TX filter with frequency-dependent input impedance Zi and constant output impedance R



# Transmitter front end – Digital TX filter with analog filter (1/3) (oversampled approach)

Digital TX filtering & T/2-interpolation (Upsampling of 2X),
 (T/2) overlapping DAC, trivial front-end analog filter



# Transmitter front end – Digital TX filter with analog filter (2/3)=>Effect of (T/2) interpolation



# Transmitter front end – Digital TX filter with analog filter (3/3)=>Interpolation coefficients



### Transmitted PSD: Baseline approach



### Transmitted PSD: Oversampled approach



## Comparison of the two approaches

|                                    | "Baseline"<br>solution                                   | "Oversampled" solution                                      |
|------------------------------------|----------------------------------------------------------|-------------------------------------------------------------|
| Digital filters                    | none                                                     | (1-D <sup>2</sup> )/(1-0.75 D <sup>2</sup> ) + interpolator |
| DAC                                | 800 Ms/s                                                 | 1600 Ms/s                                                   |
| AFE filter                         | 1-st order RLC LPF, f <sub>3dB</sub> =300 MHz            | Trivial R//C                                                |
| rms and peak voltage at DAC output | higher rms,<br>peak similar to "oversampled"             | lower rms,<br>peak similar to "baseline"                    |
| Excess bandwidth                   | substantial (→ sampling phase<br>dependency in receiver) | sharp bandwidth limitation<br>(EMI advantage)               |
| Controlled spectral nulls          | none                                                     | dc and 1/2T                                                 |
| Return loss                        | OK                                                       | OK                                                          |
| Transmit PSD shape                 | depends on analog components                             | digitally defined                                           |

Higher PAR for "oversampled" is compensated by lower rms of "oversampled"
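The controlled spectral nulls of the "oversampled" solution follow from the numerator of its digital filter. The sketch below evaluates $(1-D^2)/(1-0.75D^2)$ on the unit circle, reading $D$ as a one-baud-period delay, $D = e^{-j2\pi fT}$; that reading is an assumption consistent with the nulls listed at dc and 1/2T, and the function name is illustrative.

```python
import cmath
import math


def tx_filter_response(f_times_t):
    """Complex response of (1 - D^2) / (1 - 0.75 D^2) at normalized frequency f*T."""
    d_squared = cmath.exp(-2j * math.pi * f_times_t) ** 2
    return (1 - d_squared) / (1 - 0.75 * d_squared)


gain_dc = abs(tx_filter_response(0.0))         # null at dc
gain_nyquist = abs(tx_filter_response(0.5))    # null at f = 1/2T
gain_mid = abs(tx_filter_response(0.25))       # mid-band gain, 2 / 1.75
```

The zeros of $1 - D^2$ at $D = \pm 1$ produce the two nulls, while the pole term $1 - 0.75D^2$ flattens the response between them.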

# Startup Protocols

## Three Operating Modes



### Sequenced Startup



## PMA Training Signals(1/2)

### Objective:

- Recover timing and adaptive filter coefficients
- Establish polarity correction, pair swap, pair deskew
- Master and slave use different sequences



# Unambiguous generation of PAM training sequences





#### Main PN sequence

#### Derived sequences

$$Sa_n = \begin{cases} Scr_n [0] \oplus 1 & \text{if } n \text{ mod } 256 = 0 \\ Scr_n [0] & \text{otherwise} \end{cases}$$

$$Sb_n = Scr_n [3] \oplus Scr_n [8]$$

$$Sc_n = Scr_n [6] \oplus Scr_n [16]$$

$$Sd_n = Scr_n [9] \oplus Scr_n [14] \oplus Scr_n [19] \oplus Scr_n [24]$$

## PMA Training Signals(2/2)

Only one pair combination makes all of the following equations equal 0.

#### Polarity correction

$$Ry_n[x] \oplus Ry_{n-13}[x] \oplus Ry_{n-33}[x] = \begin{cases} 0 & \text{(polarity = OK)} \\ 1 & \text{(polarity = NG)} \end{cases} \quad (x = 0,1,2,3)$$

$Ry_n[x]$: PAM2 demapping data of lane x

#### Pair swap, deskew

$$Ry_n[x] \oplus Ry_{n-3}[x-1] \oplus Ry_{n-8}[x-1] = \begin{cases} 0 & \text{(skew = OK)} \\ 0/1 & \text{(skew = NG)} \end{cases} \quad (x = 1,2)$$

if (remote side PMA status = NG)

$$Ry_n[3] \oplus Ry_{n-3}[2] \oplus Ry_{n-8}[2] \oplus Ry_n[0] = \begin{cases} 0 & \text{(skew = OK)} \\ 0/1 & \text{(skew = NG)} \end{cases}$$

else

$$Ry_n[3] \oplus Ry_{n-3}[2] \oplus Ry_{n-8}[2] \oplus Ry_n[1] = \begin{cases} 0 & \text{(skew = OK)} \\ 0/1 & \text{(skew = NG)} \end{cases}$$
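These checks work because every lane sequence is a fixed combination of delayed copies of one PN sequence. The sketch below demonstrates the polarity and deskew tests on lanes 1 and 2. It assumes the main PN generator obeys s[n] = s[n-13] XOR s[n-33], which is inferred from the 13 and 33 delays in the polarity equation because the generator figure is not reproduced here. Tap k of the register is read as the PN bit delayed by k symbols, and all function names are illustrative.

```python
def pn_sequence(length, seed=1):
    """PN bits with s[n] = s[n-13] XOR s[n-33] (assumed generator), nonzero 33-bit seed."""
    s = [(seed >> i) & 1 for i in range(33)]
    while len(s) < length:
        s.append(s[-13] ^ s[-33])
    return s


def derived(s, n):
    """Four lane bits at time n; tap k of the register is taken as s[n - k]."""
    lane0 = s[n] ^ (1 if n % 256 == 0 else 0)
    lane1 = s[n - 3] ^ s[n - 8]
    lane2 = s[n - 6] ^ s[n - 16]
    lane3 = s[n - 9] ^ s[n - 14] ^ s[n - 19] ^ s[n - 24]
    return lane0, lane1, lane2, lane3


s = pn_sequence(2048)
# Entries below n = 24 use wrapped indices and are not meaningful; checks start at n = 64
ry = [derived(s, n) for n in range(len(s))]
window = range(64, len(s))

# Polarity check on lane 1: all zeros when correct, all ones when the lane is inverted
polarity_ok = [ry[n][1] ^ ry[n - 13][1] ^ ry[n - 33][1] for n in window]
polarity_inverted = [(1 ^ ry[n][1]) ^ (1 ^ ry[n - 13][1]) ^ (1 ^ ry[n - 33][1])
                     for n in window]

# Deskew check between lane 2 and lane 1: zeros only when the lanes are aligned
deskew_ok = [ry[n][2] ^ ry[n - 3][1] ^ ry[n - 8][1] for n in window]
deskew_off_by_one = [ry[n][2] ^ ry[n - 4][1] ^ ry[n - 9][1] for n in window]
```

An inverted lane flips all three terms of the polarity equation, so the result is a constant 1. A wrong pair or a wrong skew leaves a pseudo-random residue instead of a constant 0, which is what makes the correct combination unambiguous.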

### References

- [1] M. Hatamian et al., "Design considerations for gigabit Ethernet 1000Base-T twisted pair transceivers," in Proc. 1998 IEEE Custom Integrated Circuits Conf., pp. 335-342, May 1998.
- [2] Kamran Azadet, "Gigabit Ethernet over Unshielded Twisted Pair Cables," in Proc. Int. Symp. VLSI Technology, Systems, and Applications, Taipei, Taiwan, Jun. 1999, pp. 167-170.
- [3] Jingyu Huang and Richard R. Spencer, "The Design of Analog Front Ends for 1000BASE-T Receivers," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 10, Oct. 2003.
- [4] http://www.ieee802.org/3/an/ (mainly from vendor: Broadcom)

# Backup slides

### Concept of excess bandwidth

- From the sampling theorem, no information is lost with baud-rate sampling
- All sampling phases convey the same amount of timing information (interpolator)



### THP coefficient calculation



Decision-point SNR for given $H_A(f)$, $N_A(f)$, $\tau$, and precoding response $h(D)$:

$$SNR_{mmse} = \left[ T \int_{-1/2T}^{1/2T} \left| h(D = e^{-j2\pi fT}) \right|^2 / \left( SNR_A^*(f) + 1 \right) df \right]^{-1}, \quad h(D) = 1 + \sum_{\ell=1}^{L} h_{\ell} D^{\ell}$$

For given $SNR_A^*(f) + 1$, determine $\arg\max_{h_1, h_2, \cdots, h_L} SNR_{mmse}$.